Corpus Statistics Approaches to Discriminating Among Near-Synonyms
Abstract
Near-synonyms are words that mean approximately the same thing and that tend to be assigned to the same leaf in ontologies such as WordNet. However, they can differ subtly in both meaning and usage (consider the pair frugal and stingy), so choosing the appropriate near-synonym for a given context is not a trivial problem. Early work on this problem was that of Edmonds (1997), who reported an experiment that used lexical co-occurrence networks to predict which of a set of near-synonyms would be used in a given context. That work concluded that corpus statistics approaches did not appear to work well for this type of problem, and it led instead to the development of machine learning approaches over lexical resources such as Choose the Right Word (Hayakawa, 1994). Our hypothesis is that some kind of corpus statistics approach may still be effective in some situations, particularly when the near-synonyms differ from each other in sentiment. Intuition based on work in sentiment analysis suggests that if the distribution of words embodying some characteristic of sentiment can predict the overall sentiment or attitude of a document, then perhaps these same words can predict the choice of an individual ‘attitudinal’ near-synonym given its context, even though this is not necessarily true for other types of near-synonym. This would reopen problems involving this type of near-synonym to corpus statistics methods. As a first step, we investigate whether attitudinal near-synonyms are more likely than other types to be predicted successfully by a corpus statistics method. In this paper we present a larger-scale experiment based on Edmonds (1997) and show that attitudinal near-synonyms can in fact be predicted more accurately using corpus statistics methods.
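To make the prediction task concrete, here is a minimal sketch of choosing a near-synonym from context using corpus co-occurrence statistics. It is not the method of Edmonds (1997) or of this paper: it swaps in plain sentence-level pointwise mutual information (PMI) for the lexical co-occurrence networks the abstract mentions. The toy corpus and the names `train`, `pmi`, and `choose` are assumptions made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real experiment would use a large corpus.
CORPUS = [
    "the frugal family saved money wisely",
    "a frugal shopper is sensible and careful",
    "the stingy boss refused to pay",
    "a stingy miser is mean and selfish",
]

def train(sentences):
    """Count how many sentences each word, and each word pair, occurs in."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        word_counts.update(tokens)
        pair_counts.update(frozenset(p) for p in combinations(sorted(tokens), 2))
    return word_counts, pair_counts, len(sentences)

def pmi(a, b, model):
    """Sentence-level PMI; unseen pairs score 0.0 (a simplifying assumption)."""
    word_counts, pair_counts, n = model
    joint = pair_counts[frozenset((a, b))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n / (word_counts[a] * word_counts[b]))

def choose(candidates, context, model):
    """Pick the candidate whose summed PMI with the context words is highest."""
    scores = {c: sum(pmi(c, w, model) for w in context) for c in candidates}
    return max(scores, key=scores.get), scores

model = train(CORPUS)
best, scores = choose(["frugal", "stingy"], ["saved", "wisely"], model)
print(best, scores)
```

In this sketch the context words carry the signal: words that co-occur with one candidate more often than chance push the choice toward it, which mirrors the intuition that sentiment-bearing context words may help predict an attitudinal near-synonym.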
Similar resources
Exploring Approaches to Discriminating among Near-Synonyms
Near-synonyms are words that mean approximately the same thing, and which tend to be assigned to the same leaf in ontologies such as WordNet. However, they can differ from each other subtly in both meaning and usage (consider the pair of near-synonyms frugal and stingy), and therefore choosing the appropriate near-synonym for a given context is not a trivial problem. Initial work by Edmonds (1997) ...
Acquiring Collocations For Lexical Choice Between Near-Synonyms
We extend a lexical knowledge-base of near-synonym differences with knowledge about their collocational behaviour. This type of knowledge is useful in the process of lexical choice between near-synonyms. We acquire collocations for the near-synonyms of interest from a corpus (only collocations with the appropriate sense and part-of-speech). For each word that collocates with a near-synonym we us...
Auditory Synaesthesia and Near Synonyms: A Corpus-Based Analysis of sheng1 and yin1 in Mandarin Chinese
This paper explores the nature of linguistic synaesthesia in the auditory domain through a corpus-based lexical semantic study of near synonyms. It has been established that the near synonyms 聲 sheng “sound” and 音 yin “sound” in Mandarin Chinese have different semantic functions in representing auditory production and auditory perception respectively. Thus, our study is devoted to testing wheth...
Deriving Conceptual Structures from Sense: A Study of Near Synonymous Sensation Verbs
In Mandarin Chinese, lexical semantic relation of near synonyms is a widespread phenomenon, and is of great interest to many linguists. Most works deal with lexical semantic relation between lexical entries. This paper investigates the differences between Chinese near synonymous sensation verbs based on the data from “Academia Sinica Balanced Corpus of Modern Mandarin Chinese” (Sinica Corpus) a...
What Can Near Synonyms Tell Us
This study examines near synonyms and tries to extract the contrasts that dictate their semantic and associated syntactic behaviors. A near synonym pair of Chinese verbs, fangbian and bianli, which mean ‘to be convenient’, is under examination. Corpus data reveal some important but opaque distributional differences between this synonym pair that are hard to be recognized solely by intuition. Th...
Publication date: 2007